mixed effect model
Functional effects models: Accounting for preference heterogeneity in panel data with machine learning
In this paper, we present a general specification for Functional Effects Models, which use Machine Learning (ML) methodologies to learn individual-specific preference parameters from socio-demographic characteristics, therefore accounting for inter-individual heterogeneity in panel choice data. We identify three specific advantages of the Functional Effects Model over traditional fixed, and random/mixed effects models: (i) by mapping individual-specific effects as a function of socio-demographic variables, we can account for these effects when forecasting choices of previously unobserved individuals (ii) the (approximate) maximum-likelihood estimation of functional effects avoids the incidental parameters problem of the fixed effects model, even when the number of observed choices per individual is small; and (iii) we do not rely on the strong distributional assumptions of the random effects model, which may not match reality. We learn functional intercept and functional slopes with powerful non-linear machine learning regressors for tabular data, namely gradient boosting decision trees and deep neural networks. We validate our proposed methodology on a synthetic experiment and three real-world panel case studies, demonstrating that the Functional Effects Model: (i) can identify the true values of individual-specific effects when the data generation process is known; (ii) outperforms both state-of-the-art ML choice modelling techniques that omit individual heterogeneity in terms of predictive performance, as well as traditional static panel choice models in terms of learning inter-individual heterogeneity. The results indicate that the FI-RUMBoost model, which combines the individual-specific constants of the Functional Effects Model with the complex, non-linear utilities of RUMBoost, performs marginally best on large-scale revealed preference panel data.
Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring
Almasi, Mina, Kristensen-McLachlan, Ross Deans
This paper investigates the potentials of Large Language Models (LLMs) as adaptive tutors in the context of second-language learning. In particular, we evaluate whether system prompting can reliably constrain LLMs to generate only text appropriate to the student's competence level. We simulate full teacher-student dialogues in Spanish using instruction-tuned, open-source LLMs ranging in size from 7B to 12B parameters. Dialogues are generated by having an LLM alternate between tutor and student roles with separate chat histories. The output from the tutor model is then used to evaluate the effectiveness of CEFR-based prompting to control text difficulty across three proficiency levels (A1, B1, C1). Our findings suggest that while system prompting can be used to constrain model outputs, prompting alone is too brittle for sustained, long-term interactional contexts - a phenomenon we term alignment drift. Our results provide insights into the feasibility of LLMs for personalized, proficiency-aligned adaptive tutors and provide a scalable method for low-cost evaluation of model performance without human participants.
A Comparison of Machine Learning Methods for Data with High-Cardinality Categorical Variables
High-cardinality categorical variables are variables for which the number of different levels is large relative to the sample size of a data set or, equivalently, there is little data per level. Highcardinality categorical variables can pose difficulties for machine learning methods such as deep neural networks and tree-based models. A simple strategy for dealing with categorical variables is to use one-hot encoding or dummy variables. But this approach often does not work well for high-cardinality categorical variables due to the reasons described below. For neural networks, a frequently adopted solution is to use entity embeddings [Guo and Berkhahn, 2016] that map every level of a categorical variable into a low-dimensional Euclidean space. For tree-boosting, an alternative to one-hot encoding is to assign a number to every level of a categorical variable, and then consider this as a one-dimensional numeric variable. Another solution implemented in the LightGBM boosting library [Ke et al., 2017] works by partitioning all levels into two subsets using an approximate approach [Fisher, 1958] when finding splits in the tree-building algorithm. Further, the CatBoost boosting library [Prokhorenkova et al., 2018] implements an approach based on ordered target statistics calculated using random partitions of the training data for handling categorical predictor variables. Random effects [Laird et al., 1982, Pinheiro and Bates, 2006] can also be used as a tool for handling high-cardinality categorical variables.
Mixture of experts models for multilevel data: modelling framework and approximation theory
Fung, Tsz Chai, Tseung, Spark C.
Multilevel data are prevalent in many real-world applications. However, it remains an open research problem to identify and justify a class of models that flexibly capture a wide range of multilevel data. Motivated by the versatility of the mixture of experts (MoE) models in fitting regression data, in this article we extend upon the MoE and study a class of mixed MoE (MMoE) models for multilevel data. Under some regularity conditions, we prove that the MMoE is dense in the space of any continuous mixed effects models in the sense of weak convergence. As a result, the MMoE has a potential to accurately resemble almost all characteristics inherited in multilevel data, including the marginal distributions, dependence structures, regression links, random intercepts and random slopes. In a particular case where the multilevel data is hierarchical, we further show that a nested version of the MMoE universally approximates a broad range of dependence structures of the random effects among different factor levels.
Probabilistic Deep Learning with Probabilistic Neural Networks and Deep Probabilistic Models
Probabilistic deep learning is deep learning that accounts for uncertainty, both model uncertainty and data uncertainty. It is based on the use of probabilistic models and deep neural networks. We distinguish two approaches to probabilistic deep learning: probabilistic neural networks and deep probabilistic models. The former employs deep neural networks that utilize probabilistic layers which can represent and process uncertainty; the latter uses probabilistic models that incorporate deep neural network components which capture complex non-linear stochastic relationships between the random variables. We discuss some major examples of each approach including Bayesian neural networks and mixture density networks (for probabilistic neural networks), and variational autoencoders, deep Gaussian processes and deep mixed effects models (for deep probabilistic models). TensorFlow Probability is a library for probabilistic modeling and inference which can be used for both approaches of probabilistic deep learning. We include its code examples for illustration.
Fast Physical Activity Suggestions: Efficient Hyperparameter Learning in Mobile Health
Menictas, Marianne, Tomkins, Sabina, Murphy, Susan
Users can be supported to adopt healthy behaviors, such as regular physical activity, via relevant and timely suggestions on their mobile devices. Recently, reinforcement learning algorithms have been found to be effective for learning the optimal context under which to provide suggestions. However, these algorithms are not necessarily designed for the constraints posed by mobile health (mHealth) settings, that they be efficient, domain-informed and computationally affordable. We propose an algorithm for providing physical activity suggestions in mHealth settings. Using domain-science, we formulate a contextual bandit algorithm which makes use of a linear mixed effects model. We then introduce a procedure to efficiently perform hyper-parameter updating, using far less computational resources than competing approaches. Not only is our approach computationally efficient, it is also easily implemented with closed form matrix algebraic updates and we show improvements over state of the art approaches both in speed and accuracy of up to 99% and 56% respectively.
Approaches to Linear Mixed Effects Models with Sign Constraints
Chen, Hao, Han, Lanshan, Lim, Alvin
Linear Mixed Effects (LME) models have been widely applied in clustered data analysis in many areas including marketing research, clinical trials, and biomedical studies. Inference can be conducted using maximum likelihood approach if assuming Normal distributions on the random effects. However, in many applications of economy, business and medicine, it is often essential to impose constraints on the regression parameters after taking their real-world interpretations into account. Therefore, in this paper we extend the unconstrained LME models to allow for sign constraints on its overall coefficients. We propose to assume a symmetric doubly truncated Normal (SDTN) distribution on the random effects instead of the unconstrained Normal distribution which is often found in classical literature. With the aforementioned change, difficulty has dramatically increased as the exact distribution of the dependent variable becomes analytically intractable. We then develop likelihood-based approaches to estimate the unknown model parameters utilizing the approximation of its exact distribution. Hypothesis testing under the new model specification is also discussed and studied empirically. Simulation studies have shown that the proposed constrained model not only improves real-world interpretations of results, but also achieves satisfactory performance on model fits as compared to the existing model.
A Bayesian model for a simulated meta-analysis
There are multiple ways to estimate a Stan model in R, but I choose to build the Stan code directly rather than using the brms or rstanarm packages. In the Stan code, we need to define the data structure, specify the parameters, specify any transformed parameters (which are just a function of the parameters), and then build the model – which includes laying out the prior distributions as well as the likelihood. In this case, the model is slightly different from what was presented in the context of a mixed effects model. The key difference is that there are prior distributions on \(\Delta\) and \(\tau\), introducing an additional level of uncertainty into the estimate. I would expect that the estimate of the overall treatment effect \(\Delta\) will have a wider 95% CI (credible interval in this context) than the 95% CI (confidence interval) for \(\delta_0\) in the mixed effects model.
Gaussian Process Boosting
In this article, we propose a novel way to combine boosting with Gaussian process and mixed effects models. Boosting [Freund and Schapire, 1996, Breiman, 1998, Friedman et al., 2000, Mason et al., 2000, Friedman, 2001, Bühlmann and Hothorn, 2007] is a machine learning technique that achieves superior predictive performance for a large variety of datasets [Chen and Guestrin, 2016, Nielsen, 2016]. Apart from this, the wide adoption of treeboosting in applied machine learning and data science is due to several advantages: boosting with trees as base learners can automatically account for complex non-linearities, discontinuities, and high-order interactions, it is robust to outliers in and multicollinearity among predictor variables, it is scale-invariant to monotone transformations of the predictor variables, it can handle missing values in predictor variables automatically by loosing almost no information [Elith et al., 2008], and boosting can perform variable selection. Gaussian processes [Williams and Rasmussen, 2006], on the other hand, are flexible nonparametric function models that achieve state-of-the-art predictive accuracy and allow for making probabilistic predictions [Gneiting et al., 2007]. Gaussian process and mixed effects models are used, for instance, for nonparametric regression, modeling of time series [Shumway and Stoffer, 2017], spatial [Banerjee et al., 2014], spatiotemporal [Cressie and Wikle, 2015], panel or longitudinal, and hierarchically clustered or grouped
Streamlined Empirical Bayes Fitting of Linear Mixed Models in Mobile Health
Menictas, Marianne, Tomkins, Sabina, Murphy, Susan A
To effect behavior change a successful algorithm must make high-quality decisions in real-time. For example, a mobile health (mHealth) application designed to increase physical activity must make contextually relevant suggestions to motivate users. While machine learning offers solutions for certain stylized settings, such as when batch data can be processed offline, there is a dearth of approaches which can deliver high-quality solutions under the specific constraints of mHealth. We propose an algorithm which provides users with contextualized and personalized physical activity suggestions. This algorithm is able to overcome a challenge critical to mHealth that complex models be trained efficiently. We propose a tractable streamlined empirical Bayes procedure which fits linear mixed effects models in large-data settings. Our procedure takes advantage of sparsity introduced by hierarchical random effects to efficiently learn the posterior distribution of a linear mixed effects model. A key contribution of this work is that we provide explicit updates in order to learn both fixed effects, random effects and hyper-parameter values. We demonstrate the success of this approach in a mobile health (mHealth) reinforcement learning application, a domain in which fast computations are crucial for real time interventions. Not only is our approach computationally efficient, it is also easily implemented with closed form matrix algebraic updates and we show improvements over state of the art approaches both in speed and accuracy of up to 99% and 56% respectively.